High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests
Authors
Abstract
We propose a new data structure, the generalized randomized k-d forest, or k-d GeRaF, for approximate nearest neighbor searching in high dimensions. In particular, we introduce new randomization techniques to specify a set of independently constructed trees where search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations, thus accelerating queries. We release public domain software GeRaF and we compare it to existing implementations of state-of-the-art methods including BBD-trees, Locality Sensitive Hashing, randomized k-d forests, and product quantization. Experimental results indicate that our method would be the method of choice in dimensions around 1,000, and probably up to 10,000, and pointsets of cardinality up to a few hundred thousand or even one million; this range of inputs is encountered in many critical applications today. For instance, we handle a real dataset of 10^6 images represented in 960 dimensions with a query time of less than 1 sec on average and 90% of responses being true nearest neighbors.
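The general idea in the abstract (several independently randomized k-d trees, each searched by a single descent with no backtracking, followed by exact distance checks on the pooled candidates) can be sketched roughly as follows. This is a minimal illustration and not the released GeRaF software: the function names, the parameters `leaf_size` and `top_dims`, and the split rule (a random pick among the highest-variance dimensions, split at the median) are assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(points, ids, rng, leaf_size=4, top_dims=3):
    """Build one randomized k-d tree over the point indices in `ids`."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)

    def variance(d):
        vals = [points[i][d] for i in ids]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)

    # Assumed randomization: pick the split dimension at random
    # among the `top_dims` dimensions of highest variance.
    ranked = sorted(range(len(points[0])), key=variance, reverse=True)
    d = rng.choice(ranked[:top_dims])

    ordered = sorted(ids, key=lambda i: points[i][d])
    mid = len(ordered) // 2
    split = points[ordered[mid]][d]  # median value along dimension d
    left = build_tree(points, ordered[:mid], rng, leaf_size, top_dims)
    right = build_tree(points, ordered[mid:], rng, leaf_size, top_dims)
    return ("node", d, split, left, right)


def build_forest(points, n_trees=8, seed=0, leaf_size=4, top_dims=3):
    """Build `n_trees` trees that differ only in their random choices."""
    rng = random.Random(seed)
    ids = list(range(len(points)))
    return [build_tree(points, ids, rng, leaf_size, top_dims)
            for _ in range(n_trees)]


def descend(tree, q):
    """Follow a single root-to-leaf path for query q (no backtracking)."""
    while tree[0] == "node":
        _, d, split, left, right = tree
        tree = left if q[d] < split else right
    return tree[1]


def query(forest, points, q):
    """Pool the leaf candidates of every tree, then pick the closest one."""
    candidates = set()
    for tree in forest:
        candidates.update(descend(tree, q))
    return min(candidates, key=lambda i: sq_dist(points[i], q))
```

Because each tree contributes only one leaf, the answer is approximate: adding trees enlarges the candidate pool and tends to raise accuracy at the cost of more distance computations.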
Similar resources
Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration
For many computer vision problems, the most time consuming component consists of nearest neighbor matching in high-dimensional spaces. There are no known exact algorithms for solving these high-dimensional problems that are faster than linear search. Approximate algorithms are known to provide large speedups with only minor loss in accuracy, but many such algorithms have been published with onl...
High Dimensional Similarity Search With Space Filling Curves
We present a new approach for approximate nearest neighbor queries for sets of high dimensional points under any L_t-metric, t = 1, ..., ∞. The proposed algorithm is efficient and simple to implement. The algorithm uses multiple shifted copies of the data points and stores them in up to (d + 1) B-trees where d is the dimensionality of the data, sorted according to their position along a space ...
Adaptively Discovering Meaningful Patterns in High-Dimensional Nearest Neighbor Search
To query high-dimensional databases, similarity search (or k nearest neighbor search) is the most extensively used method. However, since each attribute of a high-dimensional data record contains only a very small amount of information, the distance between two high-dimensional records may not always correctly reflect their similarity. So, a multi-dimensional query may have a k-nearest-neighbor set whi...
Graph-based time-space trade-offs for approximate near neighbors
We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size n = 2^{o(d)} on the d-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approx...
EFFECT OF THE NEXT-NEAREST NEIGHBOR INTERACTION ON THE ORDER-DISORDER PHASE TRANSITION
In this work, one and two-dimensional lattices are studied theoretically by a statistical mechanical approach. The nearest and next-nearest neighbor interactions are both taken into account, and the approximate thermodynamic properties of the lattices are calculated. The results of our calculations show that: (1) even though the next-nearest neighbor interaction may have an insignificant ef...
Journal title:
- CoRR
Volume abs/1603.09596, issue -
Pages -
Publication date 2016